TR-09-01 Allan Porterfield, Rob Fowler, Anirban Mandal, ...

Authors

  • Rob Fowler
  • Anirban Mandal
  • Min Yeol Lim
  • Allan Porterfield
Abstract

Multi-socket, multi-core computers are becoming ubiquitous, especially as nodes in compute clusters of all sizes. Common memory benchmarks and memory performance models treat memory as characterized by well-defined maximum bandwidth and average latency parameters. In contrast, current and future systems are based on deep hierarchies and NUMA memory systems, which are not easily described this simply. Memory performance characterization of multi-socket, multi-core systems requires measurements and models more sophisticated than simple peak bandwidth/minimum latency models. To investigate this issue, we performed a detailed experimental study of the memory performance of a variety of AMD multi-socket quad-core systems. We used the pChase benchmark to generate memory system loads with a variable number of concurrent memory operations in the system across a variable number of threads pinned to specific chips in the system. While processor differences had a minor but measurable impact on bandwidth, the make-up and structure of the memory has a major impact on achievable bandwidth. Our experiments exposed three different bottlenecks at different levels of the hardware architecture: limits on the number of references outstanding per thread; limits on the memory requests serviced by a single memory channel; and limits on the total global memory references outstanding. We discuss the impact of these limits on constraints in tuning code for these systems, the impact on compilers and operating systems, and on future system implementation decisions.

This work was supported in part by the DoD, in part by the DOE Office of Science SciDAC PERI (DE-FC02-06ER25764), and in part by NSF: Collaborative Research: The NSF Cyberinfrastructure Evaluation Center (SCI-0510267). This paper has been submitted to SIGMETRICS/Performance 09. Please limit distribution.
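The three concurrency limits named in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not the authors' model and uses no measured data: it applies Little's law (throughput = requests in flight / latency) and caps the number of requests in flight at three levels. The function name, the 64-byte line size, the 100 ns latency, and all three limit values are hypothetical placeholders chosen only to make the arithmetic visible.

```python
def achievable_bandwidth(threads, chains_per_thread, channels,
                         latency_ns=100.0, line_bytes=64,
                         per_thread_limit=8, per_channel_limit=16,
                         global_limit=48):
    """Estimate memory bandwidth in GB/s from Little's law.

    All default values are hypothetical, not measurements from the paper.
    """
    # Each thread can keep only so many references outstanding.
    requested = threads * min(chains_per_thread, per_thread_limit)
    # A memory channel services a bounded number of requests at once,
    # and the whole system has a global cap on outstanding references.
    in_flight = min(requested, channels * per_channel_limit, global_limit)
    # Little's law: bytes in flight divided by latency.
    # Bytes per nanosecond is numerically equal to GB/s (10^9 bytes/s).
    return in_flight * line_bytes / latency_ns


if __name__ == "__main__":
    for threads, chains, channels in [(1, 1, 1), (1, 20, 1), (4, 8, 1), (16, 8, 4)]:
        bw = achievable_bandwidth(threads, chains, channels)
        print(f"threads={threads} chains={chains} channels={channels}: {bw:.2f} GB/s")
```

Under these assumed values, adding pointer chains to one thread stops helping once the per-thread limit is reached, adding threads on one channel stops helping at the per-channel limit, and adding channels stops helping at the global limit. A single peak-bandwidth number cannot express that three-stage saturation, which is the kind of behaviour the abstract says simple models miss.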


Similar resources

Effects of Multi-core Memory Concurrency Limits on Multi-threaded Applications

Memory access is becoming an increasingly significant impediment to extracting performance out of multi-core systems. More than ever, the effectiveness of memory system use by an application is becoming a critical determinant of performance. In previous work, we demonstrated how explicit consideration of memory concurrency provides a better model for memory performance on multi-socket, multi-co...


Empirical Evaluation of Multi-Core Memory Concurrency Initial

Multi-socket, multi-core computers are becoming ubiquitous, especially as nodes in compute clusters of all sizes. Common memory benchmarks and memory performance models treat memory as characterized by well-defined maximum bandwidth and average latency parameters. In contrast, current and future systems are based on deep hierarchies and NUMA memory systems, which are not easily described this s...


Adaptive Scheduling Using Performance Introspection

As energy becomes a driving force in High Performance Computing, determining when and how energy can be saved without impacting performance is a key goal for both HPC hardware and software. Scalability studies have shown that some memory-bound applications do not scale as the thread count increases, and in some cases performance degrades. Adaptive Scheduling recognizes when an application is in ...


Performance Consistency on Multi-socket AMD Opteron Systems

Compute nodes with multiple sockets, each of which has multiple cores, are starting to dominate in the area of scientific computing clusters. Performance inconsistencies from one execution to the next make any performance debugging or tuning difficult. The resulting performance inconsistencies are bigger for memory-bound applications but still noticeable for all but the most compute-intensive ap...




Publication date: 2009